Recommended from our members
Relating dominance formalisms
We establish for the first time a formal relationship between dominance graphs, used for modeling semantics, and grammar formalisms with underspecified dominance links, used for modeling syntax. We present a translation of normal dominance graphs into Unordered Vector Grammars with Dominance Links (UVG-DL) and prove that the configurations of the dominance graph correspond to the derivation trees of the grammar. Moreover, the standard algorithms for both formalisms compute isomorphic charts
Grammar Approximation by Representative Sublanguage: A New Model for Language Learning
We propose a new language learning model that learns a syntactic-semantic grammar from a small number of natural language strings annotated with their semantics, along with basic assumptions about natural language syntax. We show that the search space for grammar induction is a complete grammar lattice, which guarantees the uniqueness of the learned grammar
MADA+TOKAN Manual
MADA is a system for Morphological Analysis and Disambiguation for Arabic. TOKAN is a general tokenizer for MADA-disambiguated text. Internally, MADA also makes use of ALMORGEANA, an Arabic lexeme-based morphology analyzer
Arabic Diacritization through Full Morphological Tagging
We present a diacritization system for written Arabic which is based on a lexical resource. It combines a tagger and a lexeme language model. It improves on the best results reported in the literature
VigNet: Grounding Language in Graphics using Frame Semantics
This paper introduces Vignette Semantics, a lexical semantic theory based on Frame Semantics that represents conceptual and graphical relations. We also describe a lexical resource that implements this theory, VigNet, and its application in text-to-scene generation
Frame Semantics in Text-to-Scene Generation
3D graphics scenes are difficult to create, requiring users to learn and utilize a series of complex menus, dialog boxes, and often tedious direct manipulation techniques. By giving up some amount of control afforded by such interfaces we have found that users can use natural language to quickly and easily create a wide variety of 3D scenes. Natural language offers an interface that is intuitive and immediately accessible by anyone, without requiring any special skill or training. The WordsEye system (http://www.wordseye.com) has been used by several thousand users on the web to create over 10,000 scenes. The system relies on a large database of 3D models and poses to depict entities and actions. We describe how the current version of the system incorporates the type of lexical and real-world knowledge needed to depict scenes from language
Collecting Spatial Information for Locations in a Text-to-Scene Conversion System
We investigate using Amazon Mechanical Turk (AMT) for building a low-level description corpus and populating VigNet, a comprehensive semantic resource that we will use in a text-to-scene generation system. To depict a picture of a location, VigNet should contain the knowledge about the typical objects in that location and the arrangements of those objects. Such information is mostly common-sense knowledge that is taken for granted by human beings and is not stated in existing lexical resources and in text corpora. In this paper we focus on collecting objects of locations using AMT. Our results show that it is a promising approach
Conventional Orthography for Dialectal Arabic (CODA): Principles and Guidelines -- Egyptian Arabic - Version 0.7 - March 2012
This document introduces CODA (Conventional Orthography for Dialectal Arabic) and presents specifications and detailed guidelines for Egyptian Arabic CODA. CODA addresses the problem of inconsistent orthographic choices in raw (naturally occurring) written dialectal Arabic text. The specifications are a succinct summary, while the guidelines contain details and examples. The document has three parts, ordered from the most general to the most specific. In Part 1, we define CODA and present its general goals, principles and considerations in a non-dialect-specific manner. In Part 2, we present a high-level CODA specification for Egyptian Arabic (EGY). In Part 3, we present detailed guidelines for EGY CODA
Parsing Arabic Dialects
The Arabic language is a collection of spoken dialects with important phonological, morphological, lexical, and syntactic differences, along with a standard written language, Modern Standard Arabic (MSA). Since the spoken dialects are not officially written, it is very costly to obtain adequate corpora for training dialect NLP tools such as parsers. In this paper, we address the problem of parsing transcribed spoken Levantine Arabic (LA). We do not assume the existence of any annotated LA corpus (except for development and testing), nor of a parallel LA-MSA corpus. Instead, we use explicit knowledge about the relation between LA and MSA